ABSTRACT: The Tandem NonStop System is a fault-tolerant [1], expandable, and distributed computer
system designed expressly for online transaction processing. This paper describes the key
primitives of the kernel of the operating system. The first section describes the basic hardware
building blocks and introduces their software analogs: processes and messages. Using these
primitives, a mechanism that allows fault-tolerant resource access, the process-pair, is described.
The paper concludes with some observations on this type of system structure and on actual use of
the system.

INTRODUCTION

Fault-tolerant computing systems have been built over the last two decades in a number of places to
satisfy a variety of goals. The results of these differing approaches have been summarized in
references 1, 3, and 11 (see page 12). In the past, many of these systems have been designed for
specific tasks, such as telephone switching, where the costs of failure are significant. In addition,
the designers of most of the systems did not intend to provide general purpose hardware and
software modules which the end user would customize to form a reliable system.

An increasing number of online applications in commercial data processing has created a demand
for fault-tolerant general purpose computing. These applications are also characterized by their
high rate of growth, which requires that the computing system be significantly expanded over its
lifetime. The Tandem system is intended to fit these requirements.

HARDWARE ORGANIZATION 

A network consists of up to 255 nodes. Each node is composed of multiple processor and i/o
controller modules interconnected by redundant buses [2,3] as shown in PMS [3] notation in Figure 1.
A node consists of two to sixteen processors, where each processor (Pcentral) has its own power
supply, memory, backup battery, and i/o channel (Sio). All processors are interconnected by redundant
interprocessor buses (Sipb). Each i/o controller (Kdisc, Ksync, etc.) is connected to two i/o channels
and is powered from two different power supplies using a diode ORing scheme. Finally, dual-ported
i/o devices such as discs (Tdisc) may be connected to a second i/o controller. The contents of a disc
may be "mirrored" on a second volume, but this function is supported primarily by software rather
than by hardware. I/o devices other than discs are normally single ported and are connected to one
i/o controller.

The processors are 16 bits wide with up to 2 MB of memory per processor. The internal clock period
of the processor is 100 ns, resulting in a register-to-register add in 0.6 microseconds and a load of
a word from memory in 1.1 microseconds.

The two interprocessor buses (Sipb) provide each processor with two point-to-point paths to each
other processor and to itself. Data transfers to and from the buses are buffered in high speed
16-word packet buffers in each processor, allowing data transfers at a rate of 13.3 megabytes/sec.
This is more than three times faster than the processor's memory, which assures that
interprocessor messages are not delayed by the bus bandwidth.

Hardware Fault Model 

The system design goal is to provide continuous operation in the presence of a single fault. This
requires that all single faults be detectable, diagnosable, and repairable online. In addition, the
software must allow reintegration of the repaired module into the system.

Error recovery analysis needs some assumptions to be made about the types of faults that the
system can tolerate. First, a fault in either a processor, its memory, or its power supply must be
contained in and therefore at most disable that processor. Second, a fault in either an
interprocessor bus or an i/o channel will at most disable that bus. Third, a fault in an i/o controller
will at most disable that i/o controller. With these assumptions, it can be seen that a single fault
will at most make i/o devices attached only to one controller unavailable.
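
The conclusion of the fault model can be checked mechanically. The following is a minimal sketch in Python, not part of the system itself: the module names (cpu0, chan0, ctrlA, and so on) and the two example topologies are assumptions chosen for illustration. Each access path to a device is modelled as the set of modules it passes through, and a device stays reachable as long as one path avoids every failed module.

```python
def reachable(paths, failed):
    """A device is reachable if at least one path avoids every failed module."""
    return any(failed.isdisjoint(path) for path in paths)

# Hypothetical dual-ported disc: two controllers, each on two i/o channels.
disc_paths = [
    {"cpu0", "chan0", "ctrlA"},
    {"cpu1", "chan1", "ctrlA"},
    {"cpu0", "chan0", "ctrlB"},
    {"cpu1", "chan1", "ctrlB"},
]

# Hypothetical single-ported terminal: one controller on two i/o channels.
term_paths = [
    {"cpu0", "chan0", "ctrlT"},
    {"cpu1", "chan1", "ctrlT"},
]

modules = set().union(*disc_paths, *term_paths)

# Fail each module in turn and see which devices are lost.
disc_survives = all(reachable(disc_paths, {m}) for m in modules)
term_single_points = sorted(m for m in modules if not reachable(term_paths, {m}))
```

Under these assumptions the disc survives every single fault, while the only single fault that makes the terminal unavailable is the failure of its one controller.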



Physical events affecting the system's hardware can be divided into three classes [1]. The first
class, permanent physical faults, will be detected either on the initial occurrence of the fault or
shortly thereafter by background tests. These give rise to two kinds of problems: errors in recovery
algorithms and contamination of data bases that takes place before failure detection.

Figure 1. PMS Diagram of a Node.

Transient physical faults share the previously mentioned problem areas. In addition, unless
there is immediate detection, there is a far higher probability of data base corruption, as the
background tests are much less likely to see the problem.

The third class of problems, and the most serious in actual operation, is that caused by events
external to the system. This class includes such items as air conditioning failures, but is
primarily composed of operational errors by either the computer operator or service personnel.
These people seldom err during normal operations, but often their actions are in response to another
fault, at which point a single misstep may cause the entire system to fail.

SOFTWARE STRUCTURE

The operating system [4] provides many of the user services usually associated with medium-size
systems: multiprogramming, access to a gigabyte of virtual memory per processor, a file system,
and extensive communications facilities. However, it offers these services in a fault-tolerant
manner on a modular, expandable computer system. To do this, its structure has been designed as an
analogue of the hardware structure. As the hardware consists of multiple processors and i/o
controllers, the operating system functions are distributed over multiple processes; and as the
hardware modules are interconnected via redundant buses, the operating system processes
communicate via fault-tolerant messages.

At the time this structure was proposed and development started, December 1974, there were few 
precedents for it. Our main inspiration was found in the work of Dijkstra [5] and Brinch Hansen [6], 
whose ideas and examples provided the key kernel primitives and structure for the system design. 

Processes 

Each processor supports up to 256 concurrent processes. Each process has a private data space, but 
may share code with other processes in the same processor. A major portion of the cost of a process 
switch is chargeable to memory mapping which is required to designate the new process's code and 
data spaces. This results in a cost of approximately .5 milliseconds to switch processes. 

All processors contain both a monitor process and a memory manager process. The Monitor's
functions include process management within its processor (e.g., process creation and deletion),
information return, message system control, and fault recovery. When a process is created, it is
given a unique "processid" composed of two parts. The first is its location (node number, processor
number, and process number) and the second is either a unique timestamp or a symbolic name.
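
As an illustration of the two-part processid, here is a minimal sketch in Python. The class name, field names, and the example values are assumptions made for this sketch; the text specifies only that the first part is a location and the second is either a unique timestamp or a symbolic name.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class ProcessId:
    node: int                # node number within the network
    processor: int           # processor number within the node
    process: int             # process number within the processor
    tag: Union[int, str]     # unique timestamp (int) or symbolic name (str)

    def location(self) -> Tuple[int, int, int]:
        """First part of the processid: where the process runs."""
        return (self.node, self.processor, self.process)

    def is_named(self) -> bool:
        """True when the second part is a symbolic name rather than a timestamp."""
        return isinstance(self.tag, str)

# Two processes that occupy the same location at different times (hypothetical values).
old = ProcessId(node=1, processor=3, process=17, tag=1_000_001)
new = ProcessId(node=1, processor=3, process=17, tag=1_000_002)
named = ProcessId(node=1, processor=0, process=5, tag="DISC1")
```

One plausible reading of the timestamp is shown by `old` and `new`: they share a location but compare unequal, so a processid held for a process that no longer exists cannot be mistaken for a later occupant of the same slot.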

Process synchronization primitives include counting semaphores and process-local event flags.
Semaphores are used within the kernel to control such things as access to shared i/o controllers.
Event flags are used to signal a process that events such as device interrupt, message arrival, and
message completion have occurred. These primitives were chosen using the author's experience at
the time rather than following an exhaustive survey of available methods. They have been more
than adequate for the resource control involved in i/o operations and in the implementation of the
message system. More complex resource control is handled by requesters sending messages to the
process which "owns" the resource.

Messages 

Almost all information flow, even within a single processor, is carried in messages rather than 
through shared storage. Each process has a message queue where all messages sent to it by other 
processes are placed. Messages are queued in either a FIFO manner or according to the sender's 
priority, at the receiver process's option. The message system is designed to provide a
process-to-process communication mechanism which is independent of the location of the processes
and transparent to interprocessor bus transmission errors.

A message consists of a request for a service and a reply by the server. For example, if a process 
wishes to create another process, it calls the procedure NEWPROCESS which in turn sends a message 
to the Monitor process in the processor where the process is to be created. The sender will then 
wait for the message reply. When the Monitor retrieves the request from its message queue, it 
creates the process, and then replies to the message (which awakes the sender) with either the new 
processid or an error indication. 

This model is similar to that discussed by Cheriton in reference 7 and has the desirable property
that it is an analog of an operating system procedure that is called by the application program to
perform a specific service and return a result. Inherent in it is a positive acknowledgement for each
logical request, which is the first key to providing fault tolerance in this system.

In addition, the application program is not allowed direct access to the message system. This is done
by providing a conventional user/privileged mode in the processor. User programs may only enter
the operating system (and privileged mode) at well defined entry points, e.g., NEWPROCESS in the
previous example. This restriction provides obvious protection and information hiding benefits and
allows the operating system to control error recovery on message failures, a second key to the
system's fault tolerance.

Message primitives 

Before the error recovery strategies can be shown, the kernel's message primitives must be
introduced. A message exchange using these primitives takes the following form:

Process "R", the requestor, sends a message to process "S", the server, by calling the procedure
LINK. The caller supplies a processid, six message parameter words, and an optional resident buffer
which can contain data to be sent with the message or may be used to return a result. If the
message is successfully sent, the address of the sender's Link Control Block (LCB) is returned to R,
a matching LCB (which contains the six message parameter words, R's processid, S's processid, and
R's LCB address) is queued on S's message queue, and S is awakened on the event LREQ (request
pending).

When S is ready to process a request, it calls the procedure LISTEN, which dequeues the first LCB
on its message queue. The LCB contains the six message parameters and the size of the
buffer that the caller supplied to LINK. If there is any data that must be retrieved from R
in order to process the request, S calls the procedure READLINK with the addresses of its data
buffer and LCB. READLINK copies the data from R's buffer to S's buffer.

S then performs the operation and returns the results to R by calling the procedure
WRITELINK. S supplies its LCB and, optionally, a buffer whose contents are to be returned.
S may also use the message parameter words in its LCB to return status to R.
WRITELINK causes S's buffer and message parameter words to be copied into R's buffer and LCB. A
done flag is set in R's LCB, and then R is awakened on the event LDONE (request complete).

At this point S is finished with the message and may go on to other things. R can examine the
results of the request and then return its LCB and buffer (if any) to the system by calling
BREAKLINK.
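
The exchange just described can be sketched as a small single-threaded simulation. This is a hedged illustration in Python, not the kernel's interface: the class layouts, the lowercase function signatures, and the use of a Python set for event flags are all assumptions. Only the procedure names, the six parameter words, the LCB pairing, and the LREQ and LDONE events come from the text.

```python
from collections import deque

class LCB:
    """Link Control Block: one per end of a message exchange."""
    def __init__(self, requestor, server, params, buffer=None):
        self.requestor = requestor
        self.server = server
        self.params = list(params)   # six message parameter words
        self.buffer = buffer         # optional resident buffer (requestor end)
        self.done = False
        self.peer = None             # matching LCB at the other end

class Process:
    def __init__(self, pid):
        self.pid = pid
        self.queue = deque()         # message queue of incoming LCBs
        self.events = set()          # pending event flags

def link(r, s, params, buffer=None):
    """R sends a request to S; returns R's LCB."""
    if len(params) != 6:
        raise ValueError("exactly six message parameter words are required")
    r_lcb = LCB(r.pid, s.pid, params, buffer)
    s_lcb = LCB(r.pid, s.pid, params)
    r_lcb.peer, s_lcb.peer = s_lcb, r_lcb
    s.queue.append(s_lcb)
    s.events.add("LREQ")             # request pending
    return r_lcb

def listen(s):
    """Dequeue the first LCB on S's message queue, or None if it is empty."""
    return s.queue.popleft() if s.queue else None

def readlink(s_lcb):
    """Copy (by value) the data held in R's buffer for the server."""
    return bytes(s_lcb.peer.buffer or b"")

def writelink(s_lcb, r, reply=b"", params=None):
    """Copy S's reply and parameter words into R's buffer and LCB, then wake R."""
    r_lcb = s_lcb.peer
    if params is not None:
        r_lcb.params = list(params)
    if r_lcb.buffer is not None:
        r_lcb.buffer[:] = reply
    r_lcb.done = True
    r.events.add("LDONE")            # request complete

def breaklink(r_lcb):
    """R returns its LCB to the system, ending the exchange."""
    r_lcb.peer = None

# One request/reply round trip between hypothetical processes R and S.
R, S = Process("R"), Process("S")
r_lcb = link(R, S, [1, 0, 0, 0, 0, 0], bytearray(b"ping"))
s_lcb = listen(S)
data = readlink(s_lcb)
writelink(s_lcb, R, reply=data.upper(), params=[0] * 6)
result = bytes(r_lcb.buffer)
breaklink(r_lcb)
```

The sketch copies data in both directions rather than sharing it, which mirrors the send-by-value property; blocking, wakeup, and interprocessor transport are deliberately left out.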



Message properties

The previous example illustrates several key aspects of the message system. First, all messages are 
sent by value. By having no shared data structures, the message system looks the same irrespective 
of the relative locations of the processes R and S. 

Since information transfer between R and S only occurs via message system primitives, which in
turn only work when the supplied LCB's match (processid's for R and S and R's LCB address are
identical in both LCB's), the message is always abortable by either R or S. The requestor, R, can
terminate the message by calling BREAKLINK. If the message has not been completed, a cancel flag will
be set in S's LCB and S will be awakened on the event LCAN (request cancellation). The server, S,
may also terminate the message by setting the cancel flag in its LCB and calling WRITELINK without
specifying a buffer. This will result in the requestor being awakened on LDONE with both the
completion and the cancel flag set in its LCB.

Completion of a message with the cancel flag set provides a uniform mechanism for signaling
failures. It allows outstanding messages to be cleaned up on a process or processor failure by
setting the cancel flag, mimicking cancellation by the failed end of the message.

Certain system status messages that need no reply, such as processor failure or reload, are sent by
system processes to application processes. It is important that the system process not be blocked if
the application process does not pick up the message. Hence the sender calls BREAKLINK
immediately after calling LINK, terminating its end of the message. This technique can only be used
when the information can fit in the six message parameter words.

A server process is free to pick up multiple requests through LISTEN, queue them internally, and
not READLINK or WRITELINK the request until it is ready to process the request or reply to it. For
example, a process controlling a disc may need to block certain requests that need to access a locked
record until the record is unlocked.

Conversely, a process may have multiple LINK's outstanding at any time. This allows an application
process to keep a read request posted for each terminal that it manages, processing the input data 
as requests complete. 

Message system resource control 

In a message-based system, allocation of control blocks and buffers is a potential source of resource 
contention and deadlock. No formal limits on process interaction via messages have been defined. 
This approach allows flexible application design, but at some risk. The resource allocation
strategies that have been devised are not "correct" in a formal sense, but they do minimize this
risk.

First, LCB allocation is controlled by providing both "reserved" and "pool" LCB's. A process may
reserve some number of LCB's to queue incoming messages and some number of LCB's to send
messages. If a process has all of its reserved (possibly none) LCB's in use, then pool LCB's will be
allocated if they are available. If an LCB cannot be obtained within 10 seconds, then the call to LINK
will fail. System server processes reserve one or more LCB's for incoming messages and a sufficient
number (dependent solely on the servers' needs during request processing) for outgoing messages
to assure that they can complete any request made of them.
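
The reserved-then-pool rule can be sketched as follows. This Python fragment is an illustrative assumption, not the kernel's allocator: the class and method names are invented, incoming and outgoing reservations are merged into one count per process for brevity, and the 10-second wait before LINK fails is reduced to an immediate None.

```python
class LcbAllocator:
    def __init__(self, pool_size):
        self.pool_free = pool_size       # LCB's shared by all processes
        self.reserved_free = {}          # process name -> unused reserved LCB's

    def reserve(self, pid, count):
        """Set aside `count` LCB's for the exclusive use of process `pid`."""
        self.reserved_free[pid] = count

    def allocate(self, pid):
        """Return 'reserved', 'pool', or None when no LCB is available."""
        if self.reserved_free.get(pid, 0) > 0:
            self.reserved_free[pid] -= 1
            return "reserved"
        if self.pool_free > 0:
            self.pool_free -= 1
            return "pool"
        return None                      # the real LINK would wait up to 10 seconds

alloc = LcbAllocator(pool_size=1)
alloc.reserve("server", 2)               # hypothetical system server process
grants = [alloc.allocate("server") for _ in range(4)]
app_grant = alloc.allocate("app")        # hypothetical process with no reservation
```

A server that reserves enough LCB's never depends on the pool to finish a request, which is the property the last sentence of the paragraph above relies on.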

Message buffer allocation is managed by several techniques. First, data buffers for incoming
messages are not allocated until the server process is ready to request the data via READLINK.
Second, buffers are allocated from different storage pools on the basis of the type of request. For
example, buffers for reads from terminals are separated from buffers for disc requests since these
two buffer types are usually held for significantly different periods of time. In addition, certain
types of system server processes have permanently allocated buffers so that they may always service
requests.

Interprocessor bus protocol

Messages between two processes in the same processor can be sent using standard mutual
exclusion techniques. Messages between processes in different processors must flow [...].

In order to make messages useful as an abstraction, it is important that the message primitives
recover from bus errors and fail only if the other process does not exist, the other processor is
[...] LCBs. In addition, localized error detection and correction is required for
fault isolation and repair. These arguments for local robustness should not be interpreted as
[...] encouraged by Saltzer in [...]. On the contrary, the author
believes that they are both necessary in fault-tolerant systems.

From an implementation and confidence viewpoint, it is desirable that the error recovery scheme
be as simple as possible yet still detect misrouted, corrupted, or lost packets. In addition, the two
[...] number of logical con [...]

[...] called the WACK list. If the request is still on the WACK list after one [...] on
[...] until the transmission is acknowledged or the destination
processor is declared down. Repeated failures to acknowledge transmission over a bus to a
processor cause that path to be marked as down. The sender processor may send up to [...] logical
[...]. Additional [...]

[...] which includes a buffer address, a transfer count, and the next expected sequence number.
When a packet is received, its checksum is computed, and the checksum and the sequence number
verified. A good packet updates the BRT entry [...]. When the transfer count becomes zero or a
packet error occurs, an interrupt is generated. On an error, the processor need only note the type
of error and flush the packet; error recovery is the responsibility of the sending processor.

The cost of this error recovery mechanism is the time lost to timeouts or packet flushing.
[...] error rate is very low: errors are only observed when a hardware
fault has occurred. For example, a month's error log for the system that this paper was prepared on
showed no error messages. Given this error rate, the correctness of the error recovery
scheme is far more important than its efficiency.

An example of the processor-to-processor protocol is the action taken when a server, S, WRITELINKs
data to its requestor, R. The request is queued on the Send Data List (SDL), given a sequence number
assigned [...]; the request is added to the WACK list, and a "here's data back" (PHDB) control
packet and the data block packet(s) are sent to R's processor.

R's processor sees a bus receive interrupt for the PHDB packet, sets the BRT buffer address to R's
buffer, sees a data completion interrupt after the data block packet(s) have been received, and then
queues an acknowledgement for S's processor. The acknowledgement is either sent as an
unsequenced control packet or piggy-backed on some control packet that is part of another request.
When the packet holding the acknowledgement arrives, S's processor completes the request and
activates S.

Every second, each processor sends an unsequenced acknowledgement packet over each bus to 
every processor. This packet has two purposes: to recover from lost acknowledgements and to tell 
the other processors that this processor is up. Every two seconds, each processor checks whether it 
has received an unsequenced packet from each other processor. If not, it considers that processor to 
be down, and cancels all messages from it as described earlier. 
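
The periodic liveness check lends itself to a short sketch. The Python below is an assumption-laden illustration rather than the kernel's code: the class and method names are invented, timers are replaced by explicit calls, and the two buses are not modelled separately. It shows only the rule that a processor not heard from during a two-second check interval is considered down.

```python
class Heartbeats:
    """Tracks which other processors have been heard from since the last check."""

    def __init__(self, others):
        self.heard = {p: False for p in others}
        self.up = {p: True for p in others}

    def receive(self, sender):
        """Called when an unsequenced acknowledgement packet arrives from `sender`."""
        if sender in self.heard:
            self.heard[sender] = True

    def check(self):
        """Run every two seconds; returns the processors newly considered down."""
        newly_down = [p for p, ok in self.heard.items() if not ok and self.up[p]]
        for p in newly_down:
            self.up[p] = False       # messages from p would be cancelled here
        self.heard = {p: False for p in self.heard}
        return newly_down

hb = Heartbeats(others=[1, 2])       # hypothetical view from processor 0
hb.receive(1)                        # processor 1 is alive, processor 2 is silent
first_check = hb.check()
hb.receive(1)
second_check = hb.check()
```

Because packets are sent every second and checked every two, a processor must miss more than one transmission before it is declared down in this model.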

Message system performance 

Sample program segments and performance data are shown below. Neither process does any
processing on the message, and message buffers are preallocated. The requestor process executes:



    while true do
    begin
        LINK( ... );
        wait for message completion;
        BREAKLINK( ... );
    end;

and the server process executes:

    while true do
    begin
        wait for a message;
        LISTEN;
        READLINK( ... );
        WRITELINK( ... );
    end;

with the results:

    READLINK    WRITELINK    Elapsed time/msg (ms)
    (bytes)     (bytes)      intracpu    intercpu

        0           0          2.1         2.6
        0         200          2.2         2.9
      200           0          2.4         4.2
     2000        2000          4.6         7.0


The asymmetry between the second and the third example occurs because a WRITELINK must
always be performed to complete the message, but a READLINK is only performed when data must
be moved from R to S.

PROCESS-PAIRS



Processes and messages provide a method for hiding processor boundaries and inter-processor bus
errors. Associating a process with a resource provides a method of addressing and accessing it. The
next step is to build a protocol that provides fault-tolerant access to the resource, protecting
against information loss due to a processor failure.

Rather than being sent to a single process, requests are sent to the "primary" process of the
"process-pair", which handles the request and controls the resource. When the primary receives a
request for an operation such as a file open or close, the primary process "checkpoints" the request
to the "backup" process via the message system. These checkpoints ensure that the backup process has
all information that it will need to assume control of the device in the event of an i/o channel
error or a failure of the primary process's processor. When the primary fails, the backup process
"takes over" and becomes the primary.



The message system routes messages to process-pairs as follows. Each processor maintains a name
table associating a symbolic name with two processids in the node. When a message is sent to a
named process, the first processid of the pair is used. If that process does not exist or is not
currently the primary process, then the message fails; the processids in the table are exchanged,
and the message is resent to the other half of the process-pair.
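
The routing rule can be sketched in a few lines. In this Python illustration the class, the `deliver` callback, and the processid strings are assumptions; `deliver` stands in for the message system and returns None when the addressed process does not exist or is not currently the primary.

```python
class NameTable:
    def __init__(self):
        self.pairs = {}                  # symbolic name -> [first, second] processid

    def register(self, name, first, second):
        self.pairs[name] = [first, second]

    def send(self, name, message, deliver):
        """Try the first processid; on failure exchange the pair and resend."""
        pair = self.pairs[name]
        reply = deliver(pair[0], message)
        if reply is None:
            pair.reverse()               # exchange the two processids in the table
            reply = deliver(pair[0], message)
        return reply

current_primary = {"p1"}                 # hypothetical: p0 has failed, p1 took over

def deliver(processid, message):
    if processid in current_primary:
        return f"{processid} handled {message}"
    return None

table = NameTable()
table.register("DISC1", "p0", "p1")      # hypothetical process-pair name
reply = table.send("DISC1", "read", deliver)
second_reply = table.send("DISC1", "read", deliver)
```

After the first failed attempt the table holds the new primary first, so later messages to the same name reach it directly.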

Error Recovery Using Process-Pairs 

Unfortunately, not all requests for service are arbitrarily retryable. For example, retrying a
request that writes a record could cause a duplicate record to be written. [...]

For example, let R and R' be primary and backup requestor processes and S and S' be primary and
backup server processes. A sequence number is kept for each opener of a file by both the requestor
and the server. In this example, the initial state has all sequence numbers equal to zero.
When R wishes to write a record into a disc file controlled by S, R sends S a message:

    (1)  R' seq=0     R seq=0 --> S seq=0     S' seq=0

S picks up the message and performs the following check to see if this is a redundant operation: 
(2) if requestor seq < my seq then return saved status 

The [...] of the saved status [...] is dependent upon the processes involved.
[...]

If the test for a redundant request fails, then the operation is performed. S reads the disc block,
checks if the record already exists, and then checkpoints the request and the new block to its
backup S':

    (3)  R' seq=0     R seq=0     S seq=0 --> S' seq=0

The block is written to disc(s), the completion status is saved and checkpointed to S', and both S
and S' increment their sequence number.

    (4)  R' seq=0     R seq=0     S seq=1 --> S' seq=1

The result is then returned to R, who also increments his sequence number.

    (5)  R' seq=0     R seq=1 <-- S seq=1     S' seq=1

Finally, R checkpoints the result to R', who increments his sequence number, returning the system
to a state indicating that there are no requests in progress.

    (6)  R' seq=1 <-- R seq=1     S seq=1     S' seq=1

Failure recovery during a request takes the following forms:

First, R' or S' may fail during the request without affecting the operation, as it can be carried out
even if the checkpoints fail.

Second, if R fails following step (1), then S performs the operation, but is unable to return the
result. When R' becomes R, it repeats the request starting at (1); but since its sequence number is
still zero, the test in (2) returns the result that would have been returned had R not failed, and
the operation is not repeated.

Finally, if S fails during the operation, S' becomes S and either does nothing or completes the
operation using the checkpointed information, saving the completion status and incrementing its
sequence number. When R resends the request (1) to the new S, it either does the operation or returns
the saved result.
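
The duplicate-detection test in (2) and the two requestor-visible recovery cases can be sketched as follows. This Python fragment is a simplified illustration under stated assumptions: the class and method names are invented, a shared list stands in for the dual-ported disc, a single sequence number per server replaces the per-opener numbers, and checkpointing is reduced to copying the sequence number and saved status to the backup after the write.

```python
class Server:
    def __init__(self, disc):
        self.disc = disc                 # shared list: stands in for the disc file
        self.seq = 0
        self.saved_status = None

    def write(self, requestor_seq, record, backup=None):
        if requestor_seq < self.seq:     # test (2): this is a redundant request
            return self.saved_status
        self.disc.append(record)         # perform the operation
        self.saved_status = "ok"
        self.seq += 1
        if backup is not None:           # checkpoint status and sequence number
            backup.seq = self.seq
            backup.saved_status = self.saved_status
        return self.saved_status

disc = []
s = Server(disc)
s_backup = Server(disc)

# R sends the write with seq 0, then fails before seeing the reply.
first = s.write(0, "record-A", backup=s_backup)

# R' takes over with its sequence number still zero and repeats the request.
retry = s.write(0, "record-A", backup=s_backup)

# Alternatively S fails after completing; S' answers the resent request.
after_takeover = s_backup.write(0, "record-A")
```

In both retry cases the saved status is returned and the record appears on the disc exactly once, which is the behaviour the sequence numbers are meant to guarantee.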

It should be emphasized that this checkpoint and error recovery process is independent of the
location of the processes, the number of processors in the system, and the other message traffic in
the system. By confining the recovery actions to the processes directly involved, the mechanism is
both simple and arbitrarily expandable.

Process-Pair Maintenance 

Process-pairs for i/o processes are created at system configuration time. A memory image for each 
processor is created with all necessary data structures. At initial load time, a processor is loaded 
from the disc. Each configured process will test to see if its other half exists, and observing that it
does not, it will become the primary. An initial command interpreter will be created by the Monitor 
process which can then be used to issue RELOAD commands to load the rest of the processors. 

When a processor is reloaded at an initial load or following repair, the configured processes will
start execution. Each will observe that the other half of its process-pair is up, and therefore it is the
backup. While this is going on, the process which reloaded the processor will be notifying all
interested processes in the system that the processor is now up. The primary of each process-pair will
then checkpoint its current state to the backup.

Application process-pairs are created in a more general manner. An initial primary process will be
created in a processor. The process will in turn select some other processor and create its backup
there. If the primary fails, then the backup will take over and it may select another processor to
create a new backup, or, as is usually done, it will wait for the original processor to be repaired and
reloaded before creating a backup. The primary of any process-pair may switch roles with its
backup. This is generally done to balance the load on a system following the repair and
reintegration of a processor.

OBSERVATIONS



The system does tolerate faults; they are repaired online and reintegrated into running systems.
More specific reliability and availability statements are best left for our users to make.

System performance is likewise difficult to comment upon. In the example of section 3.1, the process
[...]

A system structured around process-pairs communicating via messages has already faced some of
the major problems involved in distributed computing systems: centralized control, partial
system failure, and reintegration of repaired components. Two logical extensions to the system
have been done to allow distributed systems to be connected into a network: the addition of a node
specifier in processids and the extension of message destinations to include processes in other
nodes, with communication provided by more conventional data communications equipment.

The [...] of [...] computer systems have been explored in
[...] and need not be restated here. However, there is no question that there is some
[...] in using a message system rather than shared memories [...].

[...] users [...] write application process-pairs. Many do, and if the design
[...] correctly [...]. A [...] discussion of the
various techniques for checkpointing is presented in [10]. However, the use of higher-level tools for
[...]

The third problem is not the handling of the fault tolerance, but the design of a
[...] system. Many users' first reaction to this design problem is to create a
single [...] to do all functions. Such a user often comes from a batch data processing
shop which is attempting to develop its first online application. This approach may get early results,
but the program will be very hard to maintain and it will not scale up, as it cannot be [...]
for higher throughput. Another reaction (which sometimes follows the first)
[...] must be a process [...] key-sequenced data base. The result of this
can be an application which handles two transactions per minute.

Application design and system sizing are still art rather than science. We have developed tools and
guidelines for application design, for simulation and modelling to aid in system sizing, and for
measurement of the resultant system [14]. These are extensively used in developing [...] for
systems and tend to eliminate disappointment upon system delivery.

Transaction systems have needs that differ somewhat from those of general purpose computing
systems. The primary function of a processor in a transaction system is to move data between discs
and terminals, with a surprisingly small amount of processing done along the way. The high
demands for message and i/o buffers may force the purchase of additional processors to gain buffer
address space, rather than to increase processor power.

Therefore, extensions to the system architecture to improve memory access can be more useful 
than reimplementation of the hardware for greater speed. The former direction has been taken by 
our NonStop II system, which is a compatible extension of our 16-bit architecture to provide a 32-bit 
logical address space. This additional addressing capability eliminates many processor limitations 
that could be traced to small address spaces. 

The decision to build a complex system based entirely upon processes interacting via messages has 
been the proper one for us. It has allowed us to construct reliable transaction systems differing in 
cost by an order of magnitude which can be constructed with the same modules and run the same 
software. It has also provided a flexible base on which to do significant software extensions without 
major redesign of existing system software. 